Coping with Growing Collections of Electronic Text

Authors

  • David Hawking
  • Kerry Webb
Abstract

Despite the trend toward distributed information sources, future digital libraries may hold as much text in electronic form as current libraries do in print. Accessing such collections by content rather than by metadata will require search-engine technology to accommodate at least a hundred-fold growth in data size. Recent developments within the ACSys Cooperative Research Centre are described, including an effective and cost-effective retrieval system (PADRE) designed to scale to multi-terabyte levels, a very large test collection for use in retrieval evaluation, techniques for selecting information servers and combining their results, and ideas for combining content searches with access by metadata. The authors wish to acknowledge that this work was carried out within the Cooperative Research Centre for Advanced Computational Systems established under the Australian Government's Cooperative Research Centres Program.

Similar resources

Interfaces to Support the Scholarly Exploration of Text Collections

The analysis of text collections forms the basis of scholarship in many disciplines in the humanities and social sciences. Despite the growing availability of electronic texts, automated techniques have not been effectively exploited to support the activities of scholars in these fields. We present a prototype search interface for exploring text collections that places equal emphasis on content...

A novel self-organising clustering model for time-event documents

Purpose: Neural document clustering techniques, e.g., self-organising map (SOM) or growing neural gas (GNG), usually assume that the quantity of textual information is stationary. However, the quantity of text is ever-increasing. We propose a novel dynamic adaptive self-organising hybrid (DASH) model, which adapts to time-event news collections not only to the neural topological structure but al...

On the quality of ART1 text clustering

There is a large and continually growing quantity of electronic text available, which contains essential human and organizational knowledge. An important research endeavor is to study and develop better ways to access this knowledge. Text clustering is a popular approach to automatically organize textual document collections by topic to help users find the information they need. Adaptive Resonanc...

Dual Spacization Approach to the Electronic Publishing

Dual spacization of publishing means the emergence of digital publishing in online and offline virtual environments alongside analogue publishing. Analogue publishing is publishing produced in the form of physical printed writing, as it appears on a single sheet, in single- or multi-page newspapers, magazines and books, or as writing on leaves and pieces of trees, natural skin and lea...

Exploration of Full-text Databases with Self-organizing Maps

Availability of large full-text document collections in electronic form has created a need for intelligent information retrieval techniques. Especially the expanding World Wide Web presupposes methods for systematic exploration of miscellaneous document collections. In this paper we introduce a new method, the WEBSOM, for this task. Self-Organizing Maps (SOMs) are used to represent documents on...

Publication date: 2007